Measuring the Semantic Distance between Languages from a Statistical Analysis of Bilingual Dictionaries

Author

  • Martin C. Cooper
Abstract

A bilingual dictionary is a valuable linguistic resource which records, among other things, the differences in the segmentation of semantic space by the two languages and hence the difficulty in producing faithful translations between the two languages. Statistical analysis of nearly a hundred dictionaries has allowed us to determine how best to measure the semantic distance between languages from bilingual dictionaries. The distribution of the number of words in language A having n translations in language B, for n = 1, 2, 3, etc., was found to have a specific shape depending on the semantic distance between the two languages. A sample of only a thousand words was sufficient to obtain an estimate of semantic distance. We give a theoretical justification for this distance based on models of the historical evolution of monolingual and bilingual dictionaries. Among our linguistic findings, we discovered, for example, that French is semantically closer to Basque than to German. We envisage an application of our semantic distance measure in the choice of an intermediate language when performing indirect translation, i.e. translating from language A to language B via a third language C.
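
As a rough illustration of the statistic the abstract describes, the following Python sketch computes, for a bilingual dictionary, the fraction of headwords in language A that have exactly n translations in language B. The function name `translation_count_distribution` and the toy French-English entries are hypothetical, chosen only for this example. The abstract does not state how the paper turns this distribution into a distance, so the sketch stops at the distribution itself.

```python
from collections import Counter


def translation_count_distribution(dictionary):
    """Return {n: fraction of headwords having exactly n distinct translations}.

    `dictionary` maps each headword in language A to a list of its
    translations in language B (a hypothetical input format).
    """
    # Count headwords by their number of distinct translations.
    counts = Counter(len(set(translations)) for translations in dictionary.values())
    total = sum(counts.values())
    return {n: counts[n] / total for n in sorted(counts)}


# Toy data for illustration only; not taken from the paper.
toy_dictionary = {
    "chat": ["cat"],
    "maison": ["house", "home"],
    "bois": ["wood", "forest", "timber"],
    "chien": ["dog"],
}

distribution = translation_count_distribution(toy_dictionary)
print(distribution)
```

On a real dictionary, one might draw a random sample of about a thousand headwords, as the abstract suggests, before calling the function.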

Similar resources

On multiword lexical units and their role in maritime dictionaries

Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...

Generating Cross-lingual Concept Space from Parallel Corpora on the Web

The information available in languages other than English on the World Wide Web is increasing significantly. To cross language boundaries between different languages, dictionaries are the most typical tools. However, the general-purpose dictionary is less sensitive in genre and domain and it is impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesa...

Pivot-Based Bilingual Dictionary Extraction from Multiple Dictionary Resources

High-quality bilingual dictionaries are rarely available for lower-density language pairs, especially for those that are closely related. Using a third language as a pivot to link two other languages is a well-known solution, and usually requires only two input bilingual dictionaries to automatically induce the new one. This approach, however, produces many incorrect translation pairs because th...

The Apertium bilingual dictionaries on the web of data

Bilingual electronic dictionaries contain collections of lexical entries in two languages, with explicitly declared translation relations between such entries. Nevertheless, they are typically developed in isolation, in their own formats and accessible through proprietary APIs. In this paper we propose the use of Semantic Web techniques to make translations available on the Web to be consumed b...

Mapping Words Between Slovak Text and its Translation to English

Word alignment in texts translated to different languages is used in various applications such as cross-language information retrieval. To search for equivalent words in text translations, various statistical methods, methods based on the position of words in phrases, and methods based on bilingual dictionaries are used. However, it is very difficult to use these methods in languages with big morpholo...

Journal title:
  • Journal of Quantitative Linguistics

Volume 15

Publication date: 2008